Abstract

We consider the set of finite sequences of length $n$ over a finite or
countable alphabet $\mathcal{C}$. We consider the function defined over
$\mathcal{C}^n$ which gives the size of the maximum overlap of a given sequence
with a (shifted) copy of itself. We compute the exact distribution and the
limiting distribution of this function when the sequence is chosen according
to a product measure with identically distributed marginals. We give a
point-wise upper bound for the velocity of this convergence. Our results hold
for a finite or countable alphabet. The non-parametric distribution is related
to the prime decomposition of positive integers. We illustrate with some
examples.

Running head The distribution of the overlapping function 
Subject class 60Axx, 60C05, 60-XX, 60Fxx, 41A25 

Keywords recurrence, overlapping, rare event, short return, first return, Rényi
entropy.



1 Introduction 

Consider a positive integer $n$. Consider the space of all sequences of length
$n$ defined over a finite or countable alphabet $\mathcal{C}$. In this work we
consider the function $S_n$ defined over $\mathcal{C}^n$ and taking values in
$\{0, \dots, n-1\}$. For each string, this function gives the size of the
maximum overlap of the string with a (shifted) copy of itself, and zero if
there is no overlap. See Definition 2.4.

The function $S_n$ is related to the first return function $T_n$, which gives
the minimum number of shifts we have to apply to the sequence in order to find
an overlap with a copy of itself, through the formula $S_n = n - T_n$.

The relevance of the first return function (and consequently of the
overlapping function) became evident in the statistical analysis of Poincaré
recurrence. To prove convergence of the number of occurrences of a string (say
of length $n$), as $n$ diverges, to the Poisson distribution, it is necessary
that the string does not overlap itself [13], or at least that the proportion
that overlaps, with respect to $n$, is small [3, 7]. If this is not the case, a
compound Poisson distribution is the limiting law [12]. There are also some
approximations for this limit [17, 18, 20].
It also appears when we consider the time elapsed until the first occurrence
of the string. This time is called the hitting time. It is known that the
hitting time can be well approximated by an exponential law with parameter
given by the measure of the string [14]. But when the string overlaps itself,
the parameter must be corrected by a factor which is the probability that the
string does not appear twice consecutively. And this probability is given by
the overlapping properties of the string [1, 2, 5, 10].

It appears as well when we consider the return time instead of the hitting
time. This law can also be well approximated by an exponential law with
parameter being the measure of the string [14]. However, when the string
overlaps itself the limiting law is a convex combination of a Dirac measure at
the origin and an exponential law [2, 5, 7]. As in the case of the hitting
time, the parameter must be corrected by the factor given above. The weight of
the convex combination is again this parameter. Surprisingly, when taking
expectation (but not any other moment [2]) this parameter cancels. This fact is
hidden when looking at Kac's Lemma [15].

As far as we know, the first paper to notice that the measure of all strings
that have large overlaps converges to zero was [9]. The authors proved the
exponential decay of this measure when "large" means greater than or equal to
$2n/3$. That result holds for $\psi$-mixing processes with exponentially
decaying function $\psi$ and with finite alphabet. Later, the same was
generalized in [1] to $\phi$-mixing processes. Here, "large" means greater than
or equal to a certain proportion $Cn$, where $C$ is a constant depending on the
cardinality of the alphabet.

Let us denote by $T_n(x_1^n) = n - S_n(x_1^n)$ the number of shifts needed to
get the first overlap of an $n$-string $x_1^n = (x_1, \dots, x_n)$ with itself.
It was proved in [21] using the Kolmogorov complexity function, and
independently in [8] using the Shannon-McMillan-Breiman Theorem, that for a
stochastic process over a finite alphabet, with an ergodic measure $\mu$ with
positive metric entropy satisfying the specification property [16], the ratio
$T_n/n$ satisfies

$$\liminf_{n \to \infty} \frac{T_n(x_1^n)}{n} = 1 ,$$

for almost every sequence $x = (x_1, x_2, \dots)$.

This result has also been proved for a class of non-uniformly expanding maps
of the interval [14] in the context of dynamical systems.

Even though the definitions of $S_n$ and $T_n$ are purely combinatorial, it is
interesting to keep in mind an equivalent definition from the dynamical point
of view. Fix an $n$-string $x_1^n$. The return time of $x_1^n$ over an infinite
sequence $y_1^\infty$ such that $y_1^n = x_1^n$ (i.e., a sequence in the
cylinder indexed by $x_1^n$) is defined explicitly as

$$\tau_{x_1^n}(y_1^\infty) = \inf\{ t \ge 1 \mid y_{t+1}^{t+n} = x_1^n \} ,$$

(and infinite otherwise). Then

$$T_n(x_1^n) = \inf_{y_1^\infty :\, y_1^n = x_1^n} \tau_{x_1^n}(y_1^\infty) . \qquad (1)$$

Namely, the first return function $T_n(x_1^n)$ (of the finite sequence
$x_1^n$) is the infimum of the return time of $x_1^n$ over all the realizations
of the process that have $x_1^n$ as initial condition. Thus, $T_n$ is called
the first return of (the $n$-string) $x_1^n$ in the dynamical literature.

A large deviation principle for $T_n$ was successively proved in [4, 6, 11] for
processes that verify different types of mixing conditions, including product
measures, ergodic Markov chains, Gibbs measures with Hölder continuous
potential, etc. The limit of the deviation function is related to the Rényi
entropy of the measure that generates the strings (see, for instance, [22] for
the definition and properties of the Rényi entropy). The existence of the Rényi
entropies is also proved there.

Studying cellular automata, [19] showed that for a counting measure over a
finite alphabet, the proportion of strings with no overlap converges to a
positive constant.

Until now, nothing was known about the distribution of $T_n$, and the existence
of its limit remained unknown. Since the sequence of random variables $T_n$ is
not tight, we are led to consider instead $S_n = n - T_n$. In this work we
consider a product measure $\mathbb{P}$ over $\mathcal{C}^n$ with identically
distributed marginals. Namely, the marginal of $\mathbb{P}$ is a probability
function over $\mathcal{C}$, which may be finite or countable. That is, the
strings $x_1^n$ are generated by independent, identically distributed random
variables. Each of these random variables has a probability distribution
defined by a vector of parameters $\theta = (p_a)_{a \in A}$ lying in the
parameter space

$$\Theta = \Big\{ \theta = (p_a)_{a \in A} \;\Big|\; p_a > 0 ,\ \sum_{a \in A} p_a = 1 \Big\} \subset (0, 1)^A .$$

Our main result reads as follows: we present explicit expressions for the
probability mass function of $S_n$ and also for its cumulative distribution
$\mathbb{P}(S_n \ge k)$. Moreover we show their convergence to a non-degenerate
limiting distribution.

The limiting probability mass function reads $q_k = m_2^k - b_k$, where
$m_2 = \sum_{a \in A} p_a^2$ is the squared $\ell_2$-norm of the parametric
vector $\theta$ and $b_k$ is a smaller order term. Thus, the limiting
distribution has an exponentially decreasing tail. We observe that, as in the
aforementioned case of the large deviations of $S_n$, the probability of $S_n$
is also related to the Rényi entropy function $R_H(\beta)$, in this case at
$\beta = 1$. We also present an explicit expression for the correction term
$b_k$. It is also related to the Rényi entropies, this time at positive
integers $\beta$. We also show that a similar result holds for the cumulative
distribution of $S_n$. As an application, we show that for the uniform
(counting) measure, the limiting measure of the non-overlapping strings
$\{S_n = 0\}$ is related to the prime decomposition of the positive integers.

The dynamical definition of $T_n$ (and therefore of $S_n$) allows us to think
of these random variables as defined on the common space of infinite
sequences. Therefore one may ask about other types of convergence. We finish
the paper showing that $S_n$ does not converge in probability to any limiting
random variable $S$.

This paper is organized as follows. In Section 2 we introduce some notation
and the basic definitions. In Section 3 we present our results and provide
some examples. Section 4 presents some tools needed for the proofs. Section 5
presents the proofs of our theorems. Finally, Section 6 shows that the
convergence in distribution of $S_n$ cannot be extended to convergence in
probability.

2 Notation and definitions 

We consider a probability product measure with identically distributed mar- 
ginals over a finite or countable alphabet C. 

The symbols of $\mathcal{C}$ are called letters. We index the set $\mathcal{C}$
by a set $A$, and we write $p_a$, $a \in A$, for the probability of these
letters. To avoid uninteresting cases we assume that $0 < p_a < 1$ for all $a$.
Thus, the letters are generated by independent identically distributed random
variables.

A finite sequence of consecutive letters of length $n$ is called an $n$-string
or a word of length $n$, and is denoted by the letter $w$, or $w_i$, or even
$w_{i,j}$. When we need to describe specifically the letters of a finite or
infinite sequence, namely $(x_a, \dots, x_b)$ with $x_i \in \mathcal{C}$ and
$0 < a \le b \le \infty$, we write simply $x_a^b$.

If $w_i$ is an $n_i$-string, $i = 1, 2, \dots, k$, with $n = \sum_{i=1}^k n_i$,
we write $w_1 w_2 \cdots w_k$ for the $n$-string which consists of the
concatenation of the $n_i$-strings $w_1, w_2, \dots, w_k$.

The object of our analysis is the following. 

Definition 2.1. For a given string $x_1^n \in \mathcal{C}^n$, the period or the
first return of $x_1^n$, denoted by $T_n(x_1^n)$, is defined by the first
self-overlapping position of the string. That is,
$T_n : \mathcal{C}^n \to \{1, \dots, n\}$ with

$$T_n(x_1^n) = \min\{ k \ge 1 \mid x_1^{n-k} = x_{k+1}^n \} \qquad (2)$$

and $T_n(x_1^n) = n$ when the above set is empty.

The fact that $T_n/n$ converges to one almost surely implies that $T_n$ is not
tight; therefore it is more convenient to consider the variables
$S_n = n - T_n \in \{0, \dots, n-1\}$. In this case we have that $S_n/n$
converges to zero almost surely.

Definition 2.2. We define $S_n(x_1^n)$ as the maximum size of the self-overlap,
among all the self-overlaps of the string $x_1^n$. Namely,

$$S_n(x_1^n) = n - T_n(x_1^n) .$$
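
The two definitions above translate directly into code. The following is a minimal sketch, not part of the original text; the function names `first_return` and `overlap` are illustrative choices, and strings are ordinary Python sequences.

```python
def first_return(x):
    """T_n: smallest shift k >= 1 such that x[:n-k] == x[k:]; n if no shift k < n works."""
    n = len(x)
    for k in range(1, n):
        if x[: n - k] == x[k:]:
            return k
    return n


def overlap(x):
    """S_n = n - T_n: size of the largest self-overlap (0 when there is none)."""
    return len(x) - first_return(x)
```

For instance, `overlap("abab")` is 2 because the prefix `ab` reappears as a suffix after two shifts, while `overlap("abc")` is 0.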
To study the level sets $\{S_n = k\}$ or even the cumulative sets
$\{S_n \le k\}$, with $k \in \{0, \dots, n-1\}$, we will use as a tool the
following sets.

Definition 2.3. Let $n$ be a positive integer. For every positive integer
$k < n$, $B_n(k)$ denotes the set of strings $x_1^n$ such that the first block
of length $k$ is arbitrary, and this block is then repeated until the $n$
symbols are completed. Namely

$$x_1^n = \underbrace{x_1^k}_{1}\, \underbrace{x_1^k}_{2} \cdots \underbrace{x_1^k}_{\lfloor n/k \rfloor}\, x_1^r ,$$

with $0 \le r < k$. If $\lfloor n/k \rfloor = n/k$, which implies that $r = 0$,
the last string is the empty string.

We will also use the following definition. 

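Definition 2.4. We set $R_n(k)$ as the set of $n$-strings
$x_1^n \in \mathcal{C}^n$ such that $x_1^n$ has an overlap of size $k$. Namely

$$R_n(k) = \{ x_1^n \in \mathcal{C}^n \mid x_1^k = x_{n-k+1}^n \} .$$

It is easy to see that the following "duality" holds

$$B_n(n - k) = R_n(k) \qquad \forall k = 1, \dots, n-1 . \qquad (3)$$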
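
The duality can be checked exhaustively on small cases. The sketch below is an illustration only: it reads $R_n(k)$ as "the first $k$ letters equal the last $k$ letters" and $B_n(k)$ as "the first block of length $k$ repeated and truncated to length $n$"; the helper names are hypothetical.

```python
from itertools import product


def in_R(x, k):
    """x has an overlap of size k: its first k letters equal its last k letters."""
    return x[:k] == x[len(x) - k:]


def in_B(x, k):
    """x is its first block of length k, repeated and truncated to length n."""
    return all(x[i] == x[i % k] for i in range(len(x)))


def duality_holds(alphabet, n):
    """Check B_n(n - k) == R_n(k) for every k = 1..n-1 over all n-strings."""
    return all(
        in_B(w, n - k) == in_R(w, k)
        for w in product(alphabet, repeat=n)
        for k in range(1, n)
    )
```

Calling `duality_holds("ab", 6)` enumerates all 64 binary strings of length 6 and compares membership in both sets for each $k$.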

Finally we put

$$m_q = \sum_{a \in A} p_a^q .$$

Observe that $m_q^{1/q}$ is the $\ell_q$-norm of the parametric vector
$\theta$. Also we put $p = \max\{p_a \mid a \in A\}$, namely, the
$\ell_\infty$-norm of $\theta$.

Without loss of generality, we can think that the entries of $\theta$ are
arranged in non-increasing order, say $\theta = (p_1, p_2, p_3, \dots)$, where
$p = p_1 \ge p_2 \ge p_3 \ge \cdots$.

3 Results 

In our main theorem we show that the cumulative distribution and the
probability mass function of $S_n$, for strictly positive integers (namely
$k \ne 0$), can be written as a geometric term plus a correction term. The
parameter of the geometric term is given by $m_2$. We also show a similar
result for the limiting cumulative and mass distribution functions. Finally,
we present a rate of convergence.

To state our result precisely we need to introduce some quantities that will
appear in the theorem as correction terms. The first two are related to the
distribution of $S_n$. The last two are the limits of the previous ones, and
are related to the limiting distribution of $S_n$. Let

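$$a_{k,n} = \mathbb{P}\Big( \bigcup_{j=\lfloor n/2 \rfloor}^{n-1} R_n(j) \setminus \bigcup_{j=k}^{\lfloor n/2 \rfloor - 1} R_n(j) \Big) + \sum_{i=k+1}^{\lfloor n/2 \rfloor - 1} \mathbb{P}\Big( R_{2i}(i) \setminus \bigcup_{j=k}^{i-1} R_{2i}(j) \Big) ,$$

and

$$b_{k,n} = \mathbb{P}\Big( \bigcup_{j=\lfloor n/2 \rfloor}^{n-1} R_n(j) \cap R_n(k) \setminus \bigcup_{j=k+1}^{\lfloor n/2 \rfloor - 1} R_n(j) \Big) + \sum_{i=k+1}^{\lfloor n/2 \rfloor} \mathbb{P}\Big( R_{2i}(i) \cap R_{2i}(k) \setminus \bigcup_{j=k+1}^{i-1} R_{2i}(j) \Big) .$$

Further

$$a_k = \sum_{i=k+1}^{\infty} \mathbb{P}\Big( R_{2i}(i) \setminus \bigcup_{j=k}^{i-1} R_{2i}(j) \Big)$$

and

$$b_k = \sum_{i=k+1}^{\infty} \mathbb{P}\Big( R_{2i}(i) \cap R_{2i}(k) \setminus \bigcup_{j=k+1}^{i-1} R_{2i}(j) \Big) .$$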

Now we state our main result. 

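Theorem 3.1. Let $\mathbb{P}$ be a product measure over $\mathcal{C}^n$ with
marginals identically distributed. Then, for all positive integers $k$ and all
$n > 2k$

a) $\mathbb{P}(S_n \ge k) = m_2^k + a_{k,n}$.

b) $\mathbb{P}(S_n = k) = m_2^k - b_{k,n}$.

c) $\lim_{n \to \infty} \mathbb{P}(S_n \ge k) = m_2^k + a_k$.

d) $\lim_{n \to \infty} \mathbb{P}(S_n = k) = m_2^k - b_k$.

Furthermore, for all $2n > 4k$ one has
$\mathbb{P}(S_{2n} \ge k) = \mathbb{P}(S_{2n+1} \ge k)$ and
$\mathbb{P}(S_{2n} = k) = \mathbb{P}(S_{2n+1} = k)$.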
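
For small $n$ the law of $S_n$ can be computed exactly by enumeration, which gives a way to experiment with the statements above. The sketch below is illustrative only (the names `overlap_distribution` and `tail` are not from the paper); it uses exact rational arithmetic and is exponential in $n$, so it is meant for short strings and small alphabets.

```python
from fractions import Fraction
from itertools import product


def overlap(x):
    """S_n: size of the largest self-overlap of x (0 if none)."""
    n = len(x)
    for k in range(1, n):
        if x[: n - k] == x[k:]:
            return n - k
    return 0


def overlap_distribution(probs, n):
    """Exact P(S_n = k), k = 0..n-1, for i.i.d. letters with marginal `probs`."""
    dist = [Fraction(0)] * n
    for word in product(range(len(probs)), repeat=n):
        weight = Fraction(1)
        for letter in word:
            weight *= probs[letter]
        dist[overlap(word)] += weight
    return dist


def tail(dist, k):
    """P(S_n >= k) from the mass function."""
    return sum(dist[k:], Fraction(0))


probs = [Fraction(2, 3), Fraction(1, 3)]      # example marginal (an assumption)
m2 = sum(p * p for p in probs)
d4, d5 = overlap_distribution(probs, 4), overlap_distribution(probs, 5)
print(tail(d4, 1) - m2)                       # the correction to the geometric term for k = 1, n = 4
print(tail(d4, 1) == tail(d5, 1))             # even/odd comparison for k = 1
```

Since $R_n(k) \subset \{S_n \ge k\}$ and $\mathbb{P}(R_n(k)) = m_2^k$ for $n \ge 2k$, the printed correction is non-negative; the second line compares the tails at lengths 4 and 5.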

The next corollary establishes what is the measure and the limiting measure 
of the set of strings with non-overlap, or simply the set of "self-avoiding words" 
[19]. 

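Corollary 3.1. Under the hypothesis of Theorem 3.1 one has

a) $\mathbb{P}(S_n = 0) = 1 - m_2 - a_{1,n}$, $\forall n > 2$.

b) $\lim_{n \to \infty} \mathbb{P}(S_n = 0) = 1 - m_2 - \sum_{i=2}^{\infty} \sum_{w \in \{S_i = 0\}} \mathbb{P}(w)^2 > (1 - p_1)(1 - m_2)$.

Furthermore, the sequence $(\mathbb{P}(S_{2n} = 0))_{n \in \mathbb{N}}$ is
decreasing. More precisely

$$\mathbb{P}(S_{2n} = 0) = \mathbb{P}(S_{2n-2} = 0) - \sum_{w \in \{S_n = 0\}} \mathbb{P}(w)^2 .$$

Remark 3.1. By the last statement of Theorem 3.1,
$\mathbb{P}(S_{2n+1} = 0) = \mathbb{P}(S_{2n} = 0)$ for all $n$.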
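
The recursion $\mathbb{P}(S_{2n} = 0) = \mathbb{P}(S_{2n-2} = 0) - \sum_{w \in \{S_n = 0\}} \mathbb{P}(w)^2$ can be checked numerically for a non-uniform measure. The following is a brute-force sketch with hypothetical helper names, using exact fractions; it is not an efficient algorithm.

```python
from fractions import Fraction
from itertools import product


def is_unbordered(word):
    """True when no proper prefix of `word` is also a suffix, i.e. S_n(word) = 0."""
    n = len(word)
    return not any(word[:k] == word[n - k:] for k in range(1, n))


def word_prob(word, probs):
    """Product-measure probability of a word given the letter probabilities."""
    weight = Fraction(1)
    for letter in word:
        weight *= probs[letter]
    return weight


def unbordered_masses(probs, n):
    """Return (sum of P(w), sum of P(w)^2) over the n-strings w with S_n = 0."""
    mass, squared = Fraction(0), Fraction(0)
    for word in product(range(len(probs)), repeat=n):
        if is_unbordered(word):
            pw = word_prob(word, probs)
            mass += pw
            squared += pw * pw
    return mass, squared


def recurrence_sides(probs, n):
    """Both sides of P(S_2n = 0) = P(S_{2n-2} = 0) - sum over {S_n = 0} of P(w)^2."""
    lhs, _ = unbordered_masses(probs, 2 * n)
    prev, _ = unbordered_masses(probs, 2 * n - 2)
    _, squared = unbordered_masses(probs, n)
    return lhs, prev - squared
```

For example, `recurrence_sides([Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)], 3)` returns two equal fractions.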

The next theorem provides the exponential rate of convergence of our main 
theorem. 

Theorem 3.2. For every non-negative integer $k$ and every positive integer
$n > 4k$ the following inequalities hold

a) \¥{Sn = k)~ lim„^o„ ¥{Sn = k)\< Cm^' 

b) >k)- lim„^oo P(5„ > < Cm^^' 

where $C$ is a positive constant (that depends only on the vector $\theta$).

The next proposition presents bounds for $a_{k,n}$, $b_{k,n}$, $a_k$, $b_k$.

Proposition 3.1. Under the hypothesis of Theorem 3.1 one has

a) $a_{k,n} \le (m_2^{k+1} - m_2^{n})/(1 - m_2)$.

b) bk,n < (m^i)/(l - m2) + (2m^/'+V(™2 - P^)) {ms/ml^y . 

c) $a_k \le m_2^{k+1}/(1 - m_2)$.

d) bk < m$+V(l - m2) . 

The bounds in the proposition above do not establish which one is the leading
term between $m_2^k$ and $a_{k,n}$ or $a_k$ in Theorem 3.1. The next
proposition shows that actually, both situations can happen. (It is obvious
that $m_2^k > \max_{n > 2k}\{b_{k,n}, b_k\}$.)

The next proposition also shows that the bound presented in Proposition 3.1 c)
is sharp. Moreover, it shows that, if $m_2 < 1/2$,
$(m_2^k)_{k \in \mathbb{N}}$ is the leading term. If $m_2 > 1/2$, the sequence
$(m_2^k)_{k \in \mathbb{N}}$ starts above the sequence
$(a_k)_{k \in \mathbb{N}}$, and then its tail becomes strictly smaller.

Proposition 3.2. Under the conditions of Theorem 3.1, there exists $A(k)$
(that satisfies $\lim_{k \to \infty} A(k) = 0$) such that
$a_k \ge m_2^{k+1}/(1 - m_2) - A(k)$. Furthermore

a) If $m_2 < 1/2$, then $m_2^k > a_k$ for all $k \in \mathbb{N}$.




/ \'' 3/2 

( "'3 I "'2 

I 372 I 3/2 

\ J ms— mg 
b) If $m_2 > 1/2$, then

1. $a_1 < m_2$.

2. There exists some $k_0 > 0$ for which $a_k > m_2^k$ for all $k > k_0$.

3.1 Examples 

To exemplify the behavior of $m_2$, $a_{k,n}$, $b_{k,n}$, $a_k$, $b_k$ we
present some examples.

Example 3.1 (Two-letter alphabet). Consider the case where
$\mathcal{C} = \{0, 1\}$ and $\theta = (p, 1-p) = (p_1, p_2)$, where
$p_1 = p$. Then $m_i = p_1^i + p_2^i$ for $i \in \mathbb{N}$, and the
inequalities given by Proposition 3.1 become

a) ak,n < — 



1 - Pi - P2 

, X , . {pi+pi)''+' ^ 2{pi+pir+' ( pi+pi 

bj bk,n. I_p2_p2 + p2 \(pl+plfn 

{pl+plf+' 



c) ak < 



d) bk < 

l-vi 



1 - PI - P2 

2 2 ■ 

Pi -P2 



/n c), notice that if p > 1/2, i/ien m2 > 1/2, and item 6)2. of Proposition 
3.2 (ok > m|; /loZrfs /or all k>ko, where ko = \ log |/| log ^ | . 

Example 3.2 (Uniform measure). In this example we consider a uniform product
measure over the finite alphabet $\mathcal{C} = \{1, \dots, s\}$, so that
$\theta = (1/s, \dots, 1/s)$. Then $m_i = s\,(1/s^i) = 1/s^{i-1}$; in
particular $m_2 = 1/s$. The inequalities given by Proposition 3.1 become



S L 

^ sn(s-l)' - sfc(s-l) 



(3fe+2) 



s — IV / s— 1 

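By Proposition 3.2 a), we have that in the uniform case, $m_2^k$ is always the
leading term.

The proportion of words of length $n$ with no overlap is

$$\frac{s-1}{s} - \mathbb{P}\Big( \bigcup_{j=n/2}^{n-1} R_n(j) \setminus \bigcup_{j=1}^{n/2-1} R_n(j) \Big) - \sum_{i=2}^{n/2-1} \mathbb{P}\Big( R_{2i}(i) \setminus \bigcup_{j=1}^{i-1} R_{2i}(j) \Big) . \qquad (4)$$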
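Further

$$\bigcup_{j=n/2}^{n-1} R_n(j) \setminus \bigcup_{j=1}^{n/2-1} R_n(j) = \bigcup_{j=1}^{n-1} R_n(j) \setminus \bigcup_{j=1}^{n/2-1} R_n(j) .$$

By Lemma 4.5

$$\bigcup_{j=1}^{n-1} R_n(j) = \bigcup_{j=1}^{n/2} R_n(j) .$$

Thus the leftmost probability in (4) is

$$\mathbb{P}\Big( R_n(n/2) \setminus \bigcup_{j=1}^{n/2-1} R_n(j) \Big) ,$$

that can be added to the rightmost term in (4). Thus

$$\mathbb{P}(S_n = 0) = \frac{s-1}{s} - \sum_{i=2}^{n/2} \mathbb{P}\Big( R_{2i}(i) \setminus \bigcup_{j=1}^{i-1} R_{2i}(j) \Big) .$$

Similarly, the limiting proportion of words with no overlap is exactly

$$\frac{s-1}{s} - \sum_{i=2}^{\infty} \mathbb{P}\Big( R_{2i}(i) \setminus \bigcup_{j=1}^{i-1} R_{2i}(j) \Big) = \frac{s-1}{s} - \sum_{i=2}^{\infty} \frac{1}{s^i}\, \mathbb{P}(S_i = 0) ,$$

where each term was rewritten using that $R_{2i}(i) \setminus \bigcup_{j<i} R_{2i}(j)$
is the set of words $ww$ with $w \in \{S_i = 0\}$. Since
$\mathbb{P}(S_{2i+1} = 0) = \mathbb{P}(S_{2i} = 0)$, the last expression becomes

$$\frac{s-1}{s} - \frac{s+1}{s} \sum_{i=1}^{\infty} \frac{1}{s^{2i}}\, \mathbb{P}(S_{2i} = 0) .$$

Moreover

$$\mathbb{P}(S_{2n} = 0) = \mathbb{P}(S_{2n-2} = 0) - \frac{1}{s^n}\, \mathbb{P}(S_n = 0) .$$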
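
In the uniform case the two relations $\mathbb{P}(S_{2i+1} = 0) = \mathbb{P}(S_{2i} = 0)$ and $\mathbb{P}(S_{2n} = 0) = \mathbb{P}(S_{2n-2} = 0) - s^{-n}\,\mathbb{P}(S_n = 0)$ determine the whole sequence of proportions. The sketch below iterates them with exact fractions; the function name is illustrative, and starting the recursion from a value 1 for the empty word is a convention adopted here so that the first step gives $(s-1)/s$.

```python
from fractions import Fraction


def no_overlap_proportions(s, n_max):
    """q[n] = P(S_n = 0) under the uniform measure on s letters, n = 1..n_max."""
    q = {1: Fraction(1)}                       # a single letter has no self-overlap
    for n in range(2, n_max + 1):
        if n % 2 == 1:
            q[n] = q[n - 1]                    # odd lengths copy the previous even length
        else:
            half = n // 2
            prev = q[n - 2] if n > 2 else Fraction(1)   # convention for the empty word
            q[n] = prev - q[half] / Fraction(s) ** half
    return q


q = no_overlap_proportions(2, 8)
print([q[n] * 2 ** n for n in range(1, 9)])    # number of binary words without overlap
```

Multiplying `q[n]` by `s ** n` gives the number of words of length $n$ with no overlap, which can be compared with a direct enumeration for small $n$.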



4 Tools for the proofs 

Before proving our main theorem, we prove a number of preparatory lemmas.
Firstly, we recall the following classical notation. For a positive real
number $x$ we write $\lfloor x \rfloor$ for the largest integer less than or
equal to $x$. Similarly, we write $\lceil x \rceil$ for the smallest integer
greater than or equal to $x$.

Lemma 4.1. Let $p \ge 1$, $q \ge 1$. Then

$$m_{qp} \le m_p^q .$$

Proof. Since $m_q^{1/q}$ is the $\ell_q$ norm of the vector $\theta$, a
classical $\ell_q$ inequality gives

$$m_{qp} = \big( m_{qp}^{1/(qp)} \big)^{qp} \le \big( m_p^{1/p} \big)^{qp} = m_p^q .$$

□

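The following lemma is a tool to present explicit computations for the
probability $\mathbb{P}(B_n(j))$.

Lemma 4.2. The following equality holds for all positive integers $j$ and
$\ell$

$$\sum_{w \in \mathcal{C}^j} \mathbb{P}(w)^{\ell} = m_{\ell}^{\,j} .$$

Proof. For each $w \in \mathcal{C}^j$ one has

$$\mathbb{P}(w) = \prod_{a \in A} p_a^{j_a} , \quad \text{where } \sum_{a \in A} j_a = j .$$

Thus

$$\mathbb{P}(w)^{\ell} = \prod_{a \in A} \big( p_a^{\ell} \big)^{j_a} .$$

Thus

$$\sum_{w \in \mathcal{C}^j} \mathbb{P}(w)^{\ell} = \Big( \sum_{a \in A} p_a^{\ell} \Big)^{j} = m_{\ell}^{\,j} .$$

□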
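
The identity $\sum_{w \in \mathcal{C}^j} \mathbb{P}(w)^{\ell} = m_{\ell}^{\,j}$, which is the form in which this lemma is applied in the proofs below, can be confirmed numerically on a small alphabet. The following is a minimal sketch with illustrative names, using exact fractions.

```python
from fractions import Fraction
from itertools import product


def m(probs, q):
    """m_q: sum of the q-th powers of the letter probabilities."""
    return sum(p ** q for p in probs)


def power_sum_over_words(probs, j, ell):
    """Sum of P(w)^ell over all words w of length j, by direct enumeration."""
    total = Fraction(0)
    for letters in product(probs, repeat=j):   # each tuple holds the letter probabilities of one word
        weight = Fraction(1)
        for p in letters:
            weight *= p
        total += weight ** ell
    return total
```

With `probs = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]`, the value `power_sum_over_words(probs, 3, 2)` coincides with `m(probs, 2) ** 3`.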

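The next lemma says that the total measure of the $n$-strings that have small
overlap remains the same if we "cut" the central letters of the strings.

Lemma 4.3. Let $k \le \lfloor n/2 \rfloor - 1$. Then

$$\mathbb{P}\Big( \bigcup_{j=k}^{\lfloor n/2 \rfloor - 1} R_n(j) \Big) = \mathbb{P}\Big( \bigcup_{j=k}^{\lfloor n/2 \rfloor - 1} R_{2(\lfloor n/2 \rfloor - 1)}(j) \Big) .$$

Proof. $w = x_1^n \in \bigcup_{j=k}^{\lfloor n/2 \rfloor - 1} R_n(j)$ if and only
if there exists a $j$ such that $k \le j \le \lfloor n/2 \rfloor - 1$ and
$x_1^n \in R_n(j)$. Thus

$$w = w_1 w_2 w_1 ,$$

where $w_1$ is a $j$-string and $w_2$ is an $(n - 2j)$-string, and they are
independent. Now we write $w_2 = w_{2,1} w_{2,2} w_{2,3}$, where $w_{2,2}$ is
the central word of $w_2$, of length 2 in the case that $n$ is even or of
length 3 in the case that it is odd. Namely

$$w_{2,2} = x_{\lfloor n/2 \rfloor}^{\lceil n/2 \rceil + 1} ,$$

and $w_{2,1}$ and $w_{2,3}$ are words of length
$\lfloor (n - 2j)/2 \rfloor - 1$. Now, define
$\tilde{w} = w_1 w_{2,1} w_{2,3} w_1 \in R_{2(\lfloor n/2 \rfloor - 1)}(j)$,
which is independent of $w_{2,2}$. Thus

$$\mathbb{P}\Big( \bigcup_{j=k}^{\lfloor n/2 \rfloor - 1} R_n(j) \Big) = \sum_{\tilde{w}} \sum_{w_{2,2}} \mathbb{P}(w_1 w_{2,1} w_{2,3} w_1)\, \mathbb{P}(w_{2,2}) .$$

Summing each term independently, the first term sums up to
$\mathbb{P}\big( \bigcup_{j=k}^{\lfloor n/2 \rfloor - 1} R_{2(\lfloor n/2 \rfloor - 1)}(j) \big)$
and the second one sums up to one. □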

The next lemma says that the total measure of the set of n-strings with large 
overlap, goes to zero exponentially fast. 

Lemma 4.4. The following holds 



'\n/2] \ / n-l \ 



n L"/2J 
2 ■•-2 



J=i / V=L»/2J ; 

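Proof. The equality follows by duality. To prove the inequality, firstly we
have

$$\mathbb{P}\Big( \bigcup_{j=1}^{\lceil n/2 \rceil} B_n(j) \Big) \le \sum_{j=1}^{\lceil n/2 \rceil} \mathbb{P}(B_n(j)) . \qquad (6)$$

Still, if $w \in B_n(j)$, then we can write $n = j \lfloor n/j \rfloor + r$
where $0 \le r < j$. Thus

$$w = \underbrace{w_j w_j \cdots w_j}_{\lfloor n/j \rfloor \text{ times}}\, w_r ; \qquad w_j \in \mathcal{C}^j ,\ w_r \in \mathcal{C}^r .$$

Therefore, by Lemma 4.2

$$\mathbb{P}(B_n(j)) \le \sum_{w_j \in \mathcal{C}^j} \mathbb{P}(w_j)^{\lfloor n/j \rfloor}\, p^r = m_{\lfloor n/j \rfloor}^{\,j}\, p^r .$$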

By Lemma 4.1 

m, , - 1 < Too 

1 /2 

Observe that p < ■ Thus, the sum in (6) is bounded from above by 

\n/2^ 

E i^yy = LiJ-2/' • 

□ 
Lemma 4.5. The following holds

$$\bigcup_{k=1}^{n-1} R_n(k) = \bigcup_{k=1}^{\lceil n/2 \rceil} R_n(k) .$$

Proof. Let $k > n/2$. If $w \in R_n(k) = B_n(n - k)$, then
$w = w_1 w_1 \cdots w_r$, with $n = \lfloor n/j \rfloor\, j + r$, with $r$ being
the size of $w_r$ (which could be 0), for some integer $j$. If $r = 0$, we have
that $w$ overlaps in (at least) a $w_1$ string. If $r > 0$, we have that $w$
overlaps in (at least) a $w_r$ string. In both cases, we have an overlap
smaller than $n/2$, and this proves that
$\bigcup_{k=1}^{n-1} R_n(k) \subseteq \bigcup_{k=1}^{\lceil n/2 \rceil} R_n(k)$,
which concludes the proof. □
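
The content of this lemma, that a string with some self-overlap always has one of size at most $\lceil n/2 \rceil$, can be confirmed by enumeration on small cases. This is an illustrative sketch with hypothetical function names, not part of the original argument.

```python
from itertools import product


def overlap_sizes(x):
    """All sizes k in 1..n-1 for which the first k letters of x equal its last k letters."""
    n = len(x)
    return [k for k in range(1, n) if x[:k] == x[n - k:]]


def short_overlap_always_exists(alphabet, n):
    """Check that every n-string with some overlap has one of size <= ceil(n/2)."""
    half = (n + 1) // 2                        # ceil(n / 2) for a positive integer n
    for w in product(alphabet, repeat=n):
        sizes = overlap_sizes(w)
        if sizes and min(sizes) > half:
            return False
    return True
```

For example, `overlap_sizes("abaabaab")` lists the overlaps of sizes 2 and 5, and the smaller one already witnesses membership in the right-hand union.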

5 Proofs 

5.1 Proof of Theorem 3.1 

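Proof of Theorem 3.1. As shorthand notation put

$$G_n(k) = \mathbb{P}(S_n \ge k) = \mathbb{P}\Big( \bigcup_{j=k}^{n-1} R_n(j) \Big) .$$

We first consider the case when $n$ is even. By a simple decomposition

$$G_n(k) = \mathbb{P}\Big( \bigcup_{j=k}^{n-1} R_n(j) \Big) = \mathbb{P}\Big( \bigcup_{j=k}^{n/2-1} R_n(j) \Big) + \mathbb{P}\Big( \bigcup_{j=n/2}^{n-1} R_n(j) \setminus \bigcup_{j=k}^{n/2-1} R_n(j) \Big) .$$

By Lemma 4.3 the leftmost term in the last expression is equal to

$$\mathbb{P}\Big( \bigcup_{j=k}^{n/2-1} R_{n-2}(j) \Big) .$$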

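We can rewrite the last probability as

$$\mathbb{P}\Big( \bigcup_{j=k}^{(n-2)-1} R_{n-2}(j) \Big) - \mathbb{P}\Big( \bigcup_{j=n/2}^{(n-2)-1} R_{n-2}(j) \setminus \bigcup_{j=k}^{n/2-1} R_{n-2}(j) \Big) .$$

The leftmost term, by definition, is equal to $G_{n-2}(k)$. Thus, we conclude
that

$$\begin{aligned}
G_n(k) = G_{n-2}(k) &+ \mathbb{P}\Big( \bigcup_{j=n/2}^{n-1} R_n(j) \setminus \bigcup_{j=k}^{n/2-1} R_n(j) \Big) \\
&- \mathbb{P}\Big( \bigcup_{j=n/2}^{n-3} R_{n-2}(j) \setminus \bigcup_{j=k}^{n/2-1} R_{n-2}(j) \Big) .
\end{aligned}$$

A similar argument shows that

$$\begin{aligned}
G_{n-2}(k) = G_{n-4}(k) &+ \mathbb{P}\Big( \bigcup_{j=n/2-1}^{n-3} R_{n-2}(j) \setminus \bigcup_{j=k}^{n/2-2} R_{n-2}(j) \Big) \\
&- \mathbb{P}\Big( \bigcup_{j=n/2-1}^{n-5} R_{n-4}(j) \setminus \bigcup_{j=k}^{n/2-2} R_{n-4}(j) \Big) .
\end{aligned}$$

Thus

$$\begin{aligned}
G_n(k) = G_{n-4}(k) &+ \mathbb{P}\Big( \bigcup_{j=n/2}^{n-1} R_n(j) \setminus \bigcup_{j=k}^{n/2-1} R_n(j) \Big) \\
&- \mathbb{P}\Big( \bigcup_{j=n/2}^{n-3} R_{n-2}(j) \setminus \bigcup_{j=k}^{n/2-1} R_{n-2}(j) \Big) \\
&+ \mathbb{P}\Big( \bigcup_{j=n/2-1}^{n-3} R_{n-2}(j) \setminus \bigcup_{j=k}^{n/2-2} R_{n-2}(j) \Big) \\
&- \mathbb{P}\Big( \bigcup_{j=n/2-1}^{n-5} R_{n-4}(j) \setminus \bigcup_{j=k}^{n/2-2} R_{n-4}(j) \Big) .
\end{aligned}$$

Solving the two lines in between we get that they are equal to

$$\mathbb{P}\Big( R_{n-2}(n/2 - 1) \setminus \bigcup_{j=k}^{n/2-2} R_{n-2}(j) \Big) .$$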
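A recursive argument up to $k$ gives

$$\begin{aligned}
G_n(k) = G_{2k}(k) &+ \mathbb{P}\Big( \bigcup_{j=n/2}^{n-1} R_n(j) \setminus \bigcup_{j=k}^{n/2-1} R_n(j) \Big) \\
&+ \sum_{i=k+1}^{n/2-1} \mathbb{P}\Big( R_{2i}(i) \setminus \bigcup_{j=k}^{i-1} R_{2i}(j) \Big) \\
&- \mathbb{P}\Big( \bigcup_{j=k+1}^{2k-1} R_{2k}(j) \setminus \bigcup_{j=k}^{k} R_{2k}(j) \Big) .
\end{aligned}$$

Computing the first and last terms on the right hand side of the above
equality gives

$$G_n(k) = \mathbb{P}(R_{2k}(k)) + \mathbb{P}\Big( \bigcup_{j=n/2}^{n-1} R_n(j) \setminus \bigcup_{j=k}^{n/2-1} R_n(j) \Big) + \sum_{i=k+1}^{n/2-1} \mathbb{P}\Big( R_{2i}(i) \setminus \bigcup_{j=k}^{i-1} R_{2i}(j) \Big) . \qquad (7)$$

This proves a) since $\mathbb{P}(R_{2k}(k)) = m_2^k$.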

Further, since the second term on the right hand side goes to zero as $n$
diverges, by Lemma 4.4, we conclude that

$$\lim_{n \to \infty} G_n(k) = \mathbb{P}(R_{2k}(k)) + \sum_{i=k+1}^{\infty} \mathbb{P}\Big( R_{2i}(i) \setminus \bigcup_{j=k}^{i-1} R_{2i}(j) \Big) . \qquad (8)$$

This proves c).

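For the probability mass function we have

$$\mathbb{P}(S_n = k) = G_n(k) - G_n(k+1) .$$

And solving this equation using (7) we get that $\mathbb{P}(S_n = k)$ is equal
to

$$\begin{aligned}
& \mathbb{P}(R_{2k}(k)) - \mathbb{P}(R_{2(k+1)}(k+1)) \\
& - \mathbb{P}\Big( \bigcup_{j=n/2}^{n-1} R_n(j) \cap R_n(k) \setminus \bigcup_{j=k+1}^{n/2-1} R_n(j) \Big) \\
& - \sum_{i=k+2}^{n/2} \mathbb{P}\Big( R_{2i}(i) \cap R_{2i}(k) \setminus \bigcup_{j=k+1}^{i-1} R_{2i}(j) \Big) \\
& + \mathbb{P}\Big( R_{2(k+1)}(k+1) \setminus \bigcup_{j=k}^{k} R_{2(k+1)}(j) \Big) .
\end{aligned}$$

Computing the rightmost term in the first line with the last line in the above
display, the result is

$$- \mathbb{P}\big( R_{2(k+1)}(k+1) \cap R_{2(k+1)}(k) \big) .$$

Considering, with some abuse of notation, that the union running over an empty
set of indexes is the empty set, we finally get that

$$\begin{aligned}
\mathbb{P}(S_n = k) = \mathbb{P}(R_{2k}(k)) &- \mathbb{P}\Big( \bigcup_{j=n/2}^{n-1} R_n(j) \cap R_n(k) \setminus \bigcup_{j=k+1}^{n/2-1} R_n(j) \Big) \qquad (9) \\
&- \sum_{i=k+1}^{n/2} \mathbb{P}\Big( R_{2i}(i) \cap R_{2i}(k) \setminus \bigcup_{j=k+1}^{i-1} R_{2i}(j) \Big) .
\end{aligned}$$

This shows b).

By Lemma 4.4, term (9) goes to zero as $n$ diverges. Thus, the limit

$$\lim_{n \to \infty} \mathbb{P}(S_n = k)$$

exists and is equal to

$$\mathbb{P}(R_{2k}(k)) - \sum_{i=k+1}^{\infty} \mathbb{P}\Big( R_{2i}(i) \cap R_{2i}(k) \setminus \bigcup_{j=k+1}^{i-1} R_{2i}(j) \Big) . \qquad (10)$$

This shows d).

If $n$ is odd, the above argument holds after replacing $n/2$ by
$\lfloor n/2 \rfloor$. We conclude that for any positive integer $n$ we have
$G_{2n+1}(k) = G_{2n}(k)$ and
$\mathbb{P}(S_{2n+1} = k) = \mathbb{P}(S_{2n} = k)$.

□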



5.2 Proof of Corollary 3.1. 

Note that

$$\mathbb{P}(S_n = 0) = 1 - \mathbb{P}(S_n \ge 1) = 1 - m_2 - a_{1,n}$$

and similarly

$$\lim_{n \to \infty} \mathbb{P}(S_n = 0) = 1 - m_2 - a_1 .$$

But

$$a_1 = \sum_{i=2}^{\infty} \mathbb{P}\Big( R_{2i}(i) \setminus \bigcup_{j=1}^{i-1} R_{2i}(j) \Big) .$$

The set in the probability in each term is the set of words $ww$ where $w$ is
an $i$-word without any self-overlap. Namely

$$R_{2i}(i) \setminus \bigcup_{j=1}^{i-1} R_{2i}(j) = \{ ww \mid w \in \{S_i = 0\} \} .$$

This establishes the equality in b).

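Now we prove the last formula. In what follows, the first equality is just by
definition, the second one is by Lemma 4.5 and the third one is a simple
decomposition.

$$\begin{aligned}
G_n(1) &= \mathbb{P}\Big( \bigcup_{j=1}^{n-1} R_n(j) \Big) \\
&= \mathbb{P}\Big( \bigcup_{j=1}^{n/2} R_n(j) \Big) \\
&= \mathbb{P}\Big( \bigcup_{j=1}^{n/2-1} R_n(j) \Big) + \mathbb{P}\Big( R_n(n/2) \setminus \bigcup_{j=1}^{n/2-1} R_n(j) \Big) .
\end{aligned}$$

By Lemma 4.3, the leftmost term in the last display is equal to
$\mathbb{P}\big( \bigcup_{j=1}^{n/2-1} R_{n-2}(j) \big)$. But applying again
Lemma 4.5, this last probability equals
$\mathbb{P}\big( \bigcup_{j=1}^{(n-2)-1} R_{n-2}(j) \big)$, which is
$G_{n-2}(1)$.

It is straightforward to see that

$$\mathbb{P}\Big( R_n(n/2) \setminus \bigcup_{j=1}^{n/2-1} R_n(j) \Big) = \sum_{w \in \{S_{n/2} = 0\}} \mathbb{P}(w)^2 .$$

Thus we conclude that

$$\mathbb{P}(S_n = 0) = \mathbb{P}(S_{n-2} = 0) - \sum_{w \in \{S_{n/2} = 0\}} \mathbb{P}(w)^2 .$$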



It remains to show the strict inequality in b). By the above argument, the
probability of the set of $n$-strings with some overlap is increasing in $n$.
Further, the above displays show that

$$\mathbb{P}(S_n \ge 1) = \mathbb{P}(S_{n-2} \ge 1) + \sum_{w \in \{S_{n/2} = 0\}} \mathbb{P}(w)^2 .$$

Now call $p_1$ and $p_2$ the two largest $p_a$, with $a \in A$ (allowing
multiplicities among the $p_a$, that is, $A$ is considered a multi-set, thus it
may happen that $p_1 = p_2$). That is,
$p_1 = \max\{p_\alpha \mid \alpha \in A\}$ and
$p_2 = \max\{p_\alpha \mid \alpha \in A \setminus \alpha_0\}$ where
$p_{\alpha_0} = p_1$. It follows that, if $w \in \{S_n = 0\}$ then
$\mathbb{P}(w) \le p_1^{n-1} p_2$. Thus, it follows from the last display that

$$\mathbb{P}(S_n \ge 1) \le \mathbb{P}(S_{n-2} \ge 1) + \mathbb{P}(S_{n/2} = 0)\, p_1^{n/2-1} p_2 .$$

Since $\mathbb{P}(S_n = 0)$ is decreasing,

$$\mathbb{P}(S_n \ge 1) \le \mathbb{P}(S_{n-2} \ge 1) + \mathbb{P}(S_2 = 0)\, p_1^{n/2-1} p_2 .$$

An iterative argument shows that

$$\mathbb{P}(S_n \ge 1) \le \mathbb{P}(S_2 = 1) + \mathbb{P}(S_2 = 0) \sum_{j=1}^{n/2-1} p_1^j p_2 .$$

And

$$\lim_{n \to \infty} \mathbb{P}(S_n \ge 1) \le \mathbb{P}(S_2 = 1) + \mathbb{P}(S_2 = 0)\, \frac{p_1 p_2}{1 - p_1} .$$

Since $p_2 \le 1 - p_1$ we conclude that

$$\lim_{n \to \infty} \mathbb{P}(S_n \ge 1) < \mathbb{P}(S_2 = 1) + p_1 \mathbb{P}(S_2 = 0) ,$$

observing that $\mathbb{P}(S_2 = 1) = m_2$.

5.3 Proof of Theorem 3.2. 

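It follows by Theorem 3.1 that

$$\Big| \mathbb{P}(S_n = k) - \lim_{n \to \infty} \mathbb{P}(S_n = k) \Big| = | b_{k,n} - b_k | ,$$

which is bounded from above by

$$\max\Big\{ \sum_{i=n/2+1}^{\infty} \mathbb{P}\big( R_{2i}(i) \cap R_{2i}(k) \big) ,\ \mathbb{P}\Big( \bigcup_{j=n/2}^{n-1} R_n(j) \cap R_n(k) \setminus \bigcup_{j=k+1}^{n/2-1} R_n(j) \Big) \Big\} . \qquad (11)$$

Consider firstly the first term in (11). If a string $w$ belongs to
$R_{2i}(i) \cap R_{2i}(k)$ then it has the form

$$w = w_1 w_2 w_1 w_1 w_2 w_1 ,$$

where $w_1$ is a $k$-string and $w_2$ is an $(i - 2k)$-string. Therefore

$$\mathbb{P}(w) = \mathbb{P}(w_1)^4\, \mathbb{P}(w_2)^2 ,$$

and thus

$$\mathbb{P}\big( R_{2i}(i) \cap R_{2i}(k) \big) = \sum_{w_1 \in \mathcal{C}^k} \mathbb{P}(w_1)^4 \sum_{w_2 \in \mathcal{C}^{i-2k}} \mathbb{P}(w_2)^2 = m_4^k\, m_2^{i-2k} .$$

Summing over $i$, we get that the first term in (11) is bounded by

$$\Big( \frac{m_4}{m_2^2} \Big)^{k} \frac{m_2^{n/2+1}}{1 - m_2} . \qquad (12)$$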
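
The factorization $\mathbb{P}(R_{2i}(i) \cap R_{2i}(k)) = m_4^k\, m_2^{i-2k}$ for $i \ge 2k$ can be checked by enumeration on a small alphabet. The sketch below is an illustration under the reading of $R_n(k)$ as "first $k$ letters equal last $k$ letters"; the function names are hypothetical.

```python
from fractions import Fraction
from itertools import product


def m(probs, q):
    """m_q: sum of the q-th powers of the letter probabilities."""
    return sum(p ** q for p in probs)


def prob_double_overlap(probs, i, k):
    """P(R_2i(i) ∩ R_2i(k)), summing the product measure over all strings of length 2i."""
    total = Fraction(0)
    for w in product(range(len(probs)), repeat=2 * i):
        # membership: overlap of size i (w = uu) and overlap of size k
        if w[:i] == w[i:] and w[:k] == w[2 * i - k:]:
            weight = Fraction(1)
            for a in w:
                weight *= probs[a]
            total += weight
    return total
```

With `probs = [Fraction(2, 3), Fraction(1, 3)]`, `prob_double_overlap(probs, 3, 1)` equals `m(probs, 4) * m(probs, 2)`, since the qualifying strings have the form $a\,c\,a\,a\,c\,a$.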

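Consider now the second term in (11). By duality, the probability in the
second term of (11) is equal to

$$\mathbb{P}\Big( \bigcup_{j=1}^{n/2} B_n(j) \cap R_n(k) \setminus \bigcup_{j=n/2+1}^{n-k-1} B_n(j) \Big) . \qquad (13)$$

Since, by definition, $B_n(j) \subset B_n(l)$ for all $l$ multiple of $j$, one
has

$$\bigcup_{j=n/2+1}^{n-k-1} B_n(j) = \bigcup_{j=1}^{n/2-k} B_n(j) \cup \bigcup_{j=n/2+1}^{n-k-1} B_n(j) .$$

Thus, the set in (13) is equal to

$$\bigcup_{j=n/2-k+1}^{n/2} B_n(j) \cap R_n(k) \setminus \bigcup_{j=n/2+1}^{n-k-1} B_n(j) .$$

The above expression implies that it is enough to bound

$$\Big( \sum_{j=n/2-k+1}^{n/2-k/2} + \sum_{j=n/2-k/2+1}^{n/2} \Big) \mathbb{P}\big( B_n(j) \cap R_n(k) \big) = I + II .$$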


Consider $I$. Since $2j \le n - k$ and $w \in B_n(j)$, there are at least two
complete blocks of length $j$ at the beginning of $w$, and the remaining part
of $w$ has length at least $k$. Thus, we can write

$$w = w_b w_b w_l .$$

Further, since $w \in R_n(k)$, the first and last blocks of length $k$ are
equal. Thus

$$w = w_k w_m w_k .$$

The last two descriptions of $w$ imply that

$$w = w_1 w_2 w_1 w_2 w_3 w_1 ,$$

where $w_1$ has length $k$ and $w_2$ has length $j - k$. Moreover, $w_3$ has
length $n - 2j - k$. Thus, factorizing the measure of $w$ we have

$$\mathbb{P}(w) = \mathbb{P}(w_1)^3\, \mathbb{P}(w_2)^2\, \mathbb{P}(w_3) .$$

Recall that $p = \max\{p_a \mid a \in A\}$. Therefore

$$\mathbb{P}\big( B_n(j) \cap R_n(k) \big) \le m_3^k\, m_2^{j-k}\, p^{n-2j-k} .$$

Summing over $j$ we have

$$\sum_{j=n/2-k+1}^{(n-k)/2} \mathbb{P}\big( B_n(j) \cap R_n(k) \big) \le C_\theta\, m_2^{n/2} \Big( \frac{m_3}{m_2^{3/2}} \Big)^{k} , \qquad (14)$$

where $C_\theta = m_2/(m_2 - p^2)$. Finally, observe that
$m_3/m_2^{3/2} < 1$ is equivalent to $m_3^{1/3} < m_2^{1/2}$, which is true by
Lemma 4.1.

Consider $II$. Take $w = x_1^n \in B_n(j) \cap R_n(k)$. Since $w \in R_n(k)$
one has

$$w = w_k w_m w_k .$$

Since blocks can be read forward or backward, every piece of the string is also
periodic (that is, the central piece is in $B_{n-2k}(j)$). So, we can put this
together and write

$$w = w_1 w_2 w_1 w_2 w_3 w_1 .$$

The length of $w_1$ is $k$. The length of $w_2$ is $n - 2k - j$ and the length
of $w_3$ is $2j + k - n$. Factorizing the measure of $w$ we have



F{Bn{j) n Rn{k)) 



< 



k n-2k-j 2]+k-n 



P(W2)V^^+''"" (15) 



(16) 



Summing over j we have 

n/2 



F{Rr,{j)nRn{k))<C'em;^ , 

j=(„-fe)/2+l \"^2 . 
where $C'_\theta = (p/m_2)^2$. This ends the proof of $II$.

So, as C0 > C'g, take C = 2Co To end the proof of b) we need to show that 
the right hand side of (12) is less or equal than (14). To this, observe that this 
is equivalent to show that 1714 < m^m^^^ . But 

and 

This ends the proof of a). 

The proof of b) follows directly from a) by summing up the error terms in a).

□ 

5.4 Proof of Proposition 3.1. 

We first prove c). We can write

$$a_k = \Big( \sum_{i=k+1}^{2k-1} + \sum_{i=2k}^{\infty} \Big) \mathbb{P}\Big( R_{2i}(i) \setminus \bigcup_{j=k}^{i-1} R_{2i}(j) \Big) = I + II .$$



As in the proof of the first term in (11) with n = Akwe get 



^ 1712 J 1 ~ ™2 1 — ''712 



By a direct computation one has 



j _ "^2 "^2 

1 - m2 



i=k+l i=fe+l 

Thus, c) follows since m^^ < m^. 
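Proof of d). We can write

$$b_k \le \Big( \sum_{i=k+1}^{2k-1} + \sum_{i=2k}^{\infty} \Big) \mathbb{P}\big( R_{2i}(i) \cap R_{2i}(k) \big) = I + II .$$

As we computed in the proof of an upper bound for (11) when proving Theorem
3.2, $II$ is

$$\sum_{i=2k}^{\infty} m_4^k\, m_2^{i-2k} = \frac{m_4^k}{1 - m_2} .$$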
For the leading term in $I$ we note that

$$R_{2i}(i) \cap R_{2i}(k) = \{ ww \mid w \in B_{k+j}(j) \} ,$$

with $j = i - k$. Thus, for $1 \le j \le k - 1$ we compute

where $k + j = \lfloor (k+j)/j \rfloor\, j + r$ and $0 \le r \le j - 1$. We
conclude that
Therefore



7<^^^!!!i^. (17) 

1 — 7714 



Proof of a) and b). Computations similar to those done in the proof of c) and
d) give an upper bound for the second term in $a_{k,n}$ and $b_{k,n}$. The
second term in $a_{k,n}$ is bounded by

$$\sum_{i=k+1}^{n/2-1} \mathbb{P}(R_{2i}(i)) = \sum_{i=k+1}^{n/2-1} m_2^i = \frac{m_2^{k+1} - m_2^{n/2}}{1 - m_2} ,$$

and the first one by

$$\sum_{i=n/2}^{n-1} \mathbb{P}(R_n(i)) = \sum_{i=n/2}^{n-1} m_2^i = \frac{m_2^{n/2} - m_2^{n}}{1 - m_2} .$$

Thus, $a_{k,n} \le (m_2^{k+1} - m_2^{n})/(1 - m_2)$.

The first term in $b_{k,n}$ was bounded in the proof of Theorem 3.2, equation
(11), by $C_\theta\, m_2^{n/2} \big( m_3/m_2^{3/2} \big)^{k}$. The second one is
bounded as was done for $b_k$ above. □

5.5 Proof of Proposition 3.2. 

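a) follows directly from Proposition 3.1 c).

Now we prove b.1). By Corollary 3.1 b), we have
$a_1 < p_1(1 - m_2) < 1 - m_2 < m_2$. The last inequality follows since
$m_2 > 1/2$.

Now we prove the first sentence and also b.2). By definition

$$\begin{aligned}
a_k &= \sum_{i=k+1}^{\infty} \mathbb{P}\Big( R_{2i}(i) \setminus \bigcup_{j=k}^{i-1} R_{2i}(j) \Big) \\
&= \sum_{i=k+1}^{\infty} \mathbb{P}(R_{2i}(i)) - \sum_{i=k+1}^{\infty} \mathbb{P}\Big( R_{2i}(i) \cap \bigcup_{j=k}^{i-1} R_{2i}(j) \Big) \\
&= \sum_{i=k+1}^{\infty} m_2^i - \sum_{i=k+1}^{\infty} \mathbb{P}\Big( R_{2i}(i) \cap \bigcup_{j=k}^{i-1} R_{2i}(j) \Big) .
\end{aligned}$$

Bounding the union by the sum we get

$$\begin{aligned}
a_k &\ge \sum_{i=k+1}^{\infty} m_2^i - \sum_{i=k+1}^{\infty} \sum_{j=k}^{i-1} \mathbb{P}\big( R_{2i}(i) \cap R_{2i}(j) \big) \\
&= \frac{m_2^{k+1}}{1 - m_2} - \sum_{j=k}^{\infty} \sum_{i=j+1}^{\infty} \mathbb{P}\big( R_{2i}(i) \cap R_{2i}(j) \big) ,
\end{aligned}$$

where the equality was obtained by using Fubini's Theorem. Now, let us take a
look at the last term of the previous equation. It can be written as

$$\sum_{j=k}^{\infty} \sum_{i=j+1}^{2j-1} \mathbb{P}\big( R_{2i}(i) \cap R_{2i}(j) \big) + \sum_{j=k}^{\infty} \sum_{i=2j}^{\infty} \mathbb{P}\big( R_{2i}(i) \cap R_{2i}(j) \big) = I + II .$$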



Term / is bounded as in (17): 



' 1 — m4 y 1 — 7714 



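For the second one we have

$$\begin{aligned}
II &= \sum_{j=k}^{\infty} \sum_{i=2j}^{\infty} \Big( \sum_{\omega \in \mathcal{C}^j} \mathbb{P}(\omega)^4 \Big) \Big( \sum_{\omega \in \mathcal{C}^{i-2j}} \mathbb{P}(\omega)^2 \Big) \\
&= \sum_{j=k}^{\infty} \sum_{i=2j}^{\infty} m_4^j\, m_2^{i-2j} \\
&= \Big( \frac{m_4^k}{1 - m_4} \Big) \Big( \frac{1}{1 - m_2} \Big) .
\end{aligned}$$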
So, we have

$$a_k \ge m_2^k \Big( \frac{m_2}{1 - m_2} - \frac{A(k)}{m_2^k} \Big) ,$$



where 



k 



A{k) = -^[mi + -^] >I + II. 



Clearly, $\lim_{k \to \infty} A(k)/m_2^k = 0$, and also
$\lim_{k \to \infty} A(k) = 0$. Putting on the leftmost side this lower bound
and on the rightmost side the upper bound given by Proposition 3.1 c), we have
that

$$\frac{m_2^{k+1}}{1 - m_2} - A(k) \le a_k \le \frac{m_2^{k+1}}{1 - m_2} ,$$

and this proves sharpness. To prove b.2), we just have to notice that since,
by hypothesis, $m_2/(1 - m_2) > 1$, there is some $k_0$ for which

$$\frac{m_2}{1 - m_2} - \frac{A(k)}{m_2^k} > 1 , \qquad \forall k > k_0 .$$

And this concludes the proof. □

6 Non-convergence in probability 

In this section we show that $S_n$ does not converge in probability when $n$
goes to infinity. Recall that since we are considering non-trivial cases, we
have $p < 1$.

Proposition 6.1. Under the conditions of Theorem 3.1, there is no random
variable $S$ over the space of infinite sequences such that $S_n$ converges in
probability to $S$.

Proof. Suppose that $S_n$ converges to $S$ in probability. Then, for all
$\epsilon > 0$

$$\lim_{n \to \infty} \mathbb{P}\big( |S_{n+1} - S_n| < \epsilon \big) = 1 .$$

Consider $\epsilon < 1$. Since, by definition, $S_n = n - T_n$, one has

$$\{ |S_{n+1} - S_n| < \epsilon \} = \{ |T_n - T_{n+1} + 1| < \epsilon \} .$$

Since $T_n$ is non-decreasing and takes only positive integer values

$$\{ |T_n - T_{n+1} + 1| < \epsilon \} = \{ T_{n+1} = T_n + 1 \} = \{ S_{n+1} = S_n \} .$$

Conditioning on $\{T_n = k\}$ we get

This ends the proof. □